Abstract


The Solar Polar-orbit Observatory (SPO), proposed by Chinese scientists, is designed to observe the solar polar
regions in an unprecedented way with a spacecraft traveling in an orbit with a large solar inclination angle and a small ellipticity.
However, one of the most significant challenges lies in ultra-long-distance data transmission, particularly for the
Magnetic and Helioseismic Imager (MHI), which is the most important payload and generates the largest volume
of data in SPO. In this paper, we propose a tailored lossless data compression method based on the measurement
mode and characteristics of MHI data. The background outside the solar disk is removed to decrease the number of
pixels in an image under compression. Multiple predictive coding methods are combined to eliminate the
redundancy by utilizing the correlation (space, spectrum, and polarization) in the data set, improving the compression
ratio. Experimental results demonstrate that our method achieves an average compression ratio of 3.67. The
compression time is also less than the general observation period. The method exhibits strong feasibility and can be
easily adapted to MHI.

1. Introduction


The solar polar regions are vital in controlling solar activity
and driving space weather, but remain the least-known
territory on the Sun. So far, some polar exploration
missions have been carried out, such as Ulysses (Wenzel et al.
1992) and Solar Orbiter (Müller et al. 2020), but there have
been no frontal imaging observations of the solar polar regions
(Nandy et al. 2023). The Solar Polar-orbit Observatory (SPO)
proposed by Chinese scientists will directly image the solar
polar regions in an unprecedented way with a spacecraft
traveling in an orbit with a large solar inclination angle (≥80°) and a small
ellipticity. By obtaining the magnetic field and flow field with
high precision, and combining multiband remote-sensing and
in situ measurements, SPO will make breakthroughs on the
following top-level scientific objectives: to unveil the origin of
the solar magnetic activity cycle that shapes the living
environment of human beings, to unveil the origin of the
high-speed solar wind that connects the Sun and celestial
bodies in the solar system, and to construct data-driven global
heliospheric numerical models that serve as the foundation
for space weather prediction (Deng et al. 2023).

SPO will be equipped with nine payloads, of which the
Magnetic and Helioseismic Imager (MHI) is the most critical.
MHI will provide high-resolution and high-sensitivity
measurements of the photospheric vector magnetic field and
Doppler velocity with cadences of 15 minutes and 1 minute,
respectively. However, due to the particular orbit of SPO, efficient
scientific data compression and transmission schemes are
of great concern. The first period of north polar observation
for SPO (≥35°) will generate about 40 Tb (about 23.1 Tb from
MHI) of scientific data, while the available data transmission
capacity may be about 12 Tb. Limited by bandwidth, the data
have to be compressed with a ratio of no less than 3.5 (Deng et al. 2023).
Therefore, exploring high-performance data compression
methods is of great importance for MHI.

General-purpose compression algorithms, such as
Huffman coding, run-length encoding, and LZW encoding,
are widely applied and can achieve high compression ratios.
However, other factors including the absence of patent issues,
low complexity, and simple hardware implementation also
need to be considered for space payloads. Consequently,
compression algorithms for space payloads typically adopt
some international space compression standards as a blueprint
to be further refined and improved upon.

There are two distinct stages for high-performance data
compression in solar space exploration. In the early stage, limited
by the technology, lossy compression with a higher compression
ratio was chosen after weighing limited bandwidth, massive data
transmission, and tolerable data distortion. For example, the
Transition Region and Coronal Explorer (TRACE), Solar and
Heliospheric Observatory (SOHO), Solar Dynamics Observatory
(SDO), and Hinode used high-quality lossy JPEG compression to
reduce data transmission bandwidth dependence (Fischer et al.
2016). In recent years, lossless compression has been taken
into consideration due to the rapid development of transmission
technology and the relentless pursuit of high-resolution solar
observations. Currently, most payloads refer to the Lossless
JPEG image compression standard or the data compression
standard recommended by the Consultative Committee for
Space Data Systems (CCSDS) data compression working group.
The Lossless JPEG includes two typical subtypes: JPEG2000
(ITU-T 2002) and JPEG-LS (ITU-T 1998). The Full-disk
MagnetoGraph (FMG) onboard the Advanced Space-based Solar
Observatory (ASO-S) applies JPEG2000 in its data
compression (Deng et al. 2019). The CCSDS recommended
standard adopts a two-dimensional discrete wavelet transform
and bit-plane coding for images (CCSDS 2017), and uses
Golomb-Rice encoding (Golomb 1966) for arbitrary data streams
(CCSDS 2020). Liu (2018) compared a general compression
algorithm and the Golomb-Rice algorithm for compressing the full-
disk solar photosphere image, and their maximum compression
ratios were 2.053 and 2.084, respectively.

However, the above methods do not consider eliminating the
inherent redundancy described by prior knowledge, which has the potential to
significantly enhance lossless data compression efficiency. In
this article, we address this issue by removing background
information unrelated to the solar disk from the raw data, as
well as invariant features among consecutive frames and other
forms of redundant prior knowledge. Since the background
information is irrelevant to our research objectives and given
that other forms of prior knowledge redundancy are reversible
and can be reconstructed during decompression, we anticipate
that this substantial reduction of redundancy will result in a
marked improvement in lossless data compression performance,
particularly suitable for MHI applications.

Section 2 elaborates on the measurement mode of the solar
magnetic field and Doppler velocity field for MHI, and
proposes a compression strategy based on the measurement
mode; Section 3 details the compression method; in order to
verify the superiority of the method, Section 4 compares the
compression ratios of different compression methods, evaluates
the compression time of our method through experiments, and
discusses the factors affecting the effectiveness and the
feasibility of spatialization; finally, Section 5 presents the
discussion and conclusion.


2. Characteristic Analysis of MHI Data


2.1. The Measurement Mode of the Solar Magnetic Field
and Doppler Velocity Field for MHI

Presently the solar magnetic field is mainly measured based
on the Zeeman effect. MHI will carry out measurements at
several wavelength positions around the Zeeman-sensitive
photospheric spectral line and its nearby continuum using
narrowband filters, and analyze the Stokes polarization
parameters ($I$, $Q$, $U$, and $V$) of the incident light at each
spectral position using the polarization modulator. With these
observations, we can derive the vector magnetic field. Therefore,
the magnetic field measurement by a solar telescope is
essentially a polarization measurement, that is, the measurement
of Stokes parameters $Q$, $U$, and $V$.

The polarization measurement of MHI uses a differential
mode. Taking the $Q$ parameter as an example, the detected
intensities $P(I+Q)$ and $P(I-Q)$ are obtained through
polarization modulation; the difference between $P(I+Q)$ and
$P(I-Q)$ is used to obtain the $Q$ parameter, and their sum
is $I$. In order to improve the signal-to-noise ratio, short
exposure, alternating sampling, and integration for $P(I+Q)$
and $P(I-Q)$ are applied (Ai & Hu 1986). The equation for
obtaining Stokes $Q$, $U$, and $V$ is

After measuring the imaging intensity at multiple wavelength
positions near the center of the working spectral line, the
offset from the line center is calculated by fitting the spectral
contour at multiple wavelength points. The solar line-of-sight
Doppler velocity can be obtained according to the proportional
relationship between Doppler velocity and the drift of the line
center (Zhang et al. 2007).

The compression of original data for MHI is essentially the
compression of solar polarization data sets $P(u, v, \lambda_c + \Delta\lambda, I \pm S)$,
where $u$ and $v$ denote the indices for the width and height of the
image, and $\lambda_c$, $\Delta\lambda$, and $I \pm S$ represent the center wavelength,
the offset, and the polarization modes, respectively (Figure 1).


2.2. Analysis of Compression Strategies from Prior 
Knowledge 


From our particular observation data, we extract five
characteristics that can be utilized to significantly
improve the data compression efficiency.


1. For full-disk solar images, the effective information is
concentrated within the solar disk, and appropriate
templates can be used to extract the information, thereby
reducing overall data volume (Figure 2(a)).

2. Given the large-scale hemispherical shape of full-disk
intensity, a specific predictive coding can be devised to
eliminate spatial redundancy and to lower the pixel
intensity values of single-frame data (Figure 2(b)).

3. Because the intensity difference between $P(I+S)$ and
$P(I-S)$ is weak, retaining the difference achieves a superior
compression effect (Figure 2(c)).

4. For QUV data at the same wavelength, the sum of $P(I+S)$
and $P(I-S)$ is approximately invariant:
$P(I+Q) + P(I-Q) \approx P(I+U) + P(I-U) \approx P(I+V) + P(I-V)$.

5. In Doppler velocity field observation, large-scale
intensity variations appear across different wavelengths
(Figure 2(d)).

In terms of compression theory, the above five characteristics
can be summarized as background removal, elimination
of spatial redundancy in a single frame, and elimination of both
polarization and spectral redundancy in continuous frames.


3. Design of Lossless Compression Method 

Figure 3 shows the whole process of compression and
decompression. First, we design an extractor that can
effectively remove most of the external background from the
solar image, and then perform dimensionality reduction on the
extracted solar information to facilitate subsequent processing.
Next, a secondary pre-pixel (SP) predictive coding is proposed
to facilitate efficient predictive coding for spatial redundancy of
individual frames. Taking advantage of the relationship
between consecutive frames, the SP predictive coding is
combined with inter-frame differential predictive coding to
minimize the intensity of each pixel. Finally, after designing a
specific mapping function, a finite-length Golomb-Rice
encoding is chosen to achieve lossless data compression (Rice
& Plaunt 2003). All of these steps except for the
extraction are reversible; during decompression, the corresponding
inverse operations are performed to restore the data. For the
step involving filling, values outside the solar disk can be
set to zero, thereby realizing lossless compression of the
solar disk information.

3.1. Extraction and Dimensionality Reduction of Solar
Disk Information


As we know, the position of the solar disk in the image is 
fixed, and the largest inscribed circle in the full-disk solar 
image can be used as a binary template to extract information 
of the Sun and its outer edge, which can significantly reduce 
the data volume. 

Let the radius of the largest inscribed circle in the image be $R_{cut}$;
then the extracted disk size is $S_{cut} = \pi R_{cut}^2$, the side length of the
original square image is $2R_{cut}$, and the size of the square image is
$S_{image} = (2R_{cut})^2$. The compression ratio $Ratio_{cut}$ and the
compression rate $Rate_{cut}$ are calculated as follows:

$$Ratio_{cut} = \frac{S_{image}}{S_{cut}} = \frac{(2R_{cut})^2}{\pi R_{cut}^2} = \frac{4}{\pi}, \qquad Rate_{cut} = \frac{S_{cut}}{S_{image}} = \frac{\pi}{4} \tag{2}$$

The compression ratio $Ratio_{cut}$ is 1.27324, and the compression
rate $Rate_{cut}$ is 0.78540. This means the overall data size will be
decreased by about 21.46% (that is, 1 - 0.78540), while not affecting the
spatiotemporal continuity of the compressed data.
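
As a quick numerical check of the figures above, the following Python sketch compares the analytic values with the pixel fraction of a discrete inscribed-circle mask. The image side length (1024) and the helper name `inscribed_circle_mask` are illustrative assumptions and do not come from the MHI design.

```python
import math
import numpy as np

def inscribed_circle_mask(n: int) -> np.ndarray:
    """Boolean mask of the largest circle inscribed in an n x n image (illustrative)."""
    c = (n - 1) / 2.0                  # geometric center of the pixel grid
    r = n / 2.0                        # radius of the inscribed circle
    v, u = np.mgrid[0:n, 0:n]
    return (u - c) ** 2 + (v - c) ** 2 <= r ** 2

ratio_cut = 4.0 / math.pi              # S_image / S_cut
rate_cut = math.pi / 4.0               # S_cut / S_image
mask = inscribed_circle_mask(1024)     # hypothetical detector size
measured_rate = float(mask.mean())     # fraction of pixels kept by the mask

print(f"Ratio_cut = {ratio_cut:.5f}, Rate_cut = {rate_cut:.5f}")
print(f"analytic reduction = {1 - rate_cut:.2%}")
print(f"reduction measured on the discrete mask = {1 - measured_rate:.2%}")
```

The discrete mask only approximates the continuous circle, so the measured fraction differs slightly from the analytic value.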

The remaining information comprises two parts: information
within the solar disk, and background on the outer edge of the
solar disk in the form of a narrow ring (marked in blue in
Figure 4). The information on the ring ensures the effectiveness
of the proposed method even if the solar disk center deviates
from the image center.

Before subsequent predictive coding, the dimensionality
reduction process is as follows: first, the horizontal direction
is chosen as the reference direction. Then, the circle is
dimensionally reduced row by row along the reference
direction. It can be expressed as Algorithm 1:

Algorithm 1. Dimensionality Reduction (D-R) or Dimensionality
Increase (D-I)

Figure 2. Partial prior knowledge in MHI. Recoverable panel titles: (c) the 3D shape of the intensity of the solar disk; (d) spectral intensity at different wavelengths.


where $P(x)$ indicates the data after dimensionality reduction, $x$
denotes its index, and $T(u, v)$ is the template defined by


where $\lfloor A \rfloor$ represents rounding the floating-point value $A$
down to the nearest integer.
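
The row-by-row reduction and its inverse can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Algorithm 1: the template formula in `disk_template`, the function names, and the 64 x 64 test image are all assumptions made for the example.

```python
import numpy as np

def disk_template(n: int) -> np.ndarray:
    """Assumed binary template T(u, v): True inside the largest inscribed circle."""
    c = (n - 1) / 2.0
    v, u = np.mgrid[0:n, 0:n]
    return (u - c) ** 2 + (v - c) ** 2 <= (n / 2.0) ** 2

def reduce_dim(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """D-R: scan the image row by row and keep only pixels selected by the template."""
    return image[template]             # boolean indexing visits pixels in row-major order

def increase_dim(p: np.ndarray, template: np.ndarray) -> np.ndarray:
    """D-I: fill the 1D sequence back into the template; outside pixels are set to zero."""
    image = np.zeros(template.shape, dtype=p.dtype)
    image[template] = p
    return image

rng = np.random.default_rng(0)
img = rng.integers(0, 2**16, size=(64, 64), dtype=np.int64)   # synthetic frame
T = disk_template(64)
p = reduce_dim(img, T)                 # 1D data P(x)
restored = increase_dim(p, T)          # disk restored, background zero-filled
print(img.size, p.size)
```

Pixels inside the template are recovered exactly, while the discarded background comes back as zeros, which matches the filling rule described in Section 3.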


3.2. Predictive Coding Design


We design a predictive encoding algorithm for dimensionality-
reduced data based on the measurement mode and data
characteristics of MHI, as shown in Figure 1. For the first frame
$P(\lambda_c + \Delta\lambda_0, I+Q)$, we utilize the SP predictive coding. For
all frames of $P(\lambda_c + \Delta\lambda, I-S)$, we employ the inter-frame
difference predictive coding between $P(\lambda_c + \Delta\lambda, I+S)$ and
$P(\lambda_c + \Delta\lambda, I-S)$ (RIFD). For $P(\lambda_c + \Delta\lambda, I+U)$ and
$P(\lambda_c + \Delta\lambda, I+V)$ at the same wavelength position, we select
the inter-frame differential predictive coding at the same
wavelength (SUVL-IFD). For $P(\lambda_c + \Delta\lambda, I+Q)$ except the
first frame $P(\lambda_c + \Delta\lambda_0, I+Q)$, we apply the inter-frame
differential predictive coding in spectral scanning (SQL-IFD).

3.2.1. The SP Predictive Coding


The solar data contain large-scale spatial redundancy
introduced by the limb darkening effect, instrument non-
uniformity, etc. on the solar disk, and small-scale spatial
redundancy introduced by fine structures such as granules, dark
lanes, local detector noise, cosmic ray noise, etc. Because of
the high time cost of removing small-scale spatial redundancy,
we only consider large-scale spatial redundancy.


The classic forward pixel predictive coding can remove the
spatial redundancy in the reference direction (horizontal) after
dimensionality reduction of the original image, but there is still
spatial redundancy that can be removed in the other direction.
In order to remove this redundancy, the forward pixel
predictive residual is filled back to the corresponding position
of the binary template as a primary predictive residual matrix,
which is subsequently transposed. Then, the transposed matrix
is reduced in dimensionality again, keeping the reference
direction unchanged. After the second dimensionality reduction,
the forward pixel predictive coding is performed again,
and the final predictive residual is the SP predictive residual.
The entire process is referred to as the SP predictive coding,
and its calculation process is as follows:


Step 1. $P_{R1}(x) = P(x) - P(x-1)$

Step 2. $P_F(u, v) \Leftarrow P_{R1}(x)$

Step 3. $P_T(u, v) = (P_F(u, v))^{T}$

Step 4. $P_2(x) \Leftarrow P_T(u, v)$

Step 5. $P_{R2}(x) = P_2(x) - P_2(x-1)$


In the process, $P_{R1}(x)$ represents the primary pre-pixel
predictive residual, $P_2(x)$ serves as the second dimensionality
reduction information, $P_{R2}(x)$ denotes the secondary pre-pixel
predictive residual, $P_F(u, v)$ stands for the filled data, and
$P_T(u, v)$ serves as the transposed data.
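
The five steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: the circular template, the convention that the first sample of each difference is kept unchanged, the function names, and the synthetic dome-shaped test frame are all hypothetical choices, not details confirmed by the text.

```python
import numpy as np

def disk_template(n: int) -> np.ndarray:
    """Assumed template: True inside the largest circle inscribed in an n x n image."""
    c = (n - 1) / 2.0
    v, u = np.mgrid[0:n, 0:n]
    return (u - c) ** 2 + (v - c) ** 2 <= (n / 2.0) ** 2

def forward_diff(p: np.ndarray) -> np.ndarray:
    """Previous-pixel prediction residual; the first sample is kept as is (assumption)."""
    r = p.copy()
    r[1:] -= p[:-1]
    return r

def sp_encode(p: np.ndarray, template: np.ndarray) -> np.ndarray:
    r1 = forward_diff(p)                               # Step 1: primary residual
    filled = np.zeros(template.shape, dtype=p.dtype)
    filled[template] = r1                              # Step 2: fill back into the template
    transposed = filled.T                              # Step 3: transpose
    p2 = transposed[template.T]                        # Step 4: second row-by-row reduction
    return forward_diff(p2)                            # Step 5: secondary residual

def sp_decode(r2: np.ndarray, template: np.ndarray) -> np.ndarray:
    p2 = np.cumsum(r2)                                 # undo Step 5
    transposed = np.zeros(template.T.shape, dtype=r2.dtype)
    transposed[template.T] = p2                        # undo Step 4
    r1 = transposed.T[template]                        # undo Steps 3 and 2
    return np.cumsum(r1)                               # undo Step 1

# Synthetic dome-shaped "solar disk" used only to exercise the functions.
n = 64
T = disk_template(n)
c = (n - 1) / 2.0
v, u = np.mgrid[0:n, 0:n]
rho2 = ((u - c) ** 2 + (v - c) ** 2) / (n / 2.0) ** 2
dome = (1000 + 3000 * np.sqrt(np.clip(1.0 - rho2, 0.0, None))).astype(np.int64)
p = dome[T]
r2 = sp_encode(p, T)
print("mean |P| =", np.abs(p).mean(), " mean |SP residual| =", np.abs(r2).mean())
```

Because every step is a difference, a permutation, or a fill, the decoder recovers the reduced data exactly from the residual and the template.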

The SP predictive coding can be used both in the predictive
coding for the first frame data and in the preprocessing step for
subsequent operations. For convenience, the operation of SP is
represented by $\mathcal{S}$, and the result after the operation is
represented by $P^{S}(x)$. A dimensionality-reduced image after
SP prediction becomes:

$$P^{S}(x) = \mathcal{S}(P(x)) \tag{5}$$


Because $P(\lambda_c + \Delta\lambda_0, I+Q)$ is the first frame and has no
reference, its predictive coding can only be based on
itself. Therefore, its residual can be expressed as:

$$R(\lambda_c + \Delta\lambda_0, I+Q) = \mathcal{S}(P(\lambda_c + \Delta\lambda_0, I+Q)) \tag{6}$$


where $R(\lambda_c + \Delta\lambda_0, I+Q)$ is the residual after SP predictive
coding.


3.2.2. Inter-frame Differential Predictive Coding


All polarization parameters in most areas of the solar disk are
weak. The spatiotemporal redundancy of $P(I-S)$ can therefore be
removed by using the inter-frame difference predictive coding
between $P(I+S)$ and $P(I-S)$ (RIFD):


$$R(I-S) = P(I+S) - P(I-S), \quad S = Q, U, V \tag{7}$$


where $R(I-S)$ is the residual after RIFD predictive coding.

In addition, the sum of $P(I+S)$ and $P(I-S)$ at the same
wavelength is approximately equal for different $S$. By applying the SP
predictive coding and Equation (8), the spatiotemporal
redundancy of $P(I+U)$ and $P(I+V)$ can be eliminated.
This is the inter-frame differential predictive coding at the same
wavelength (SUVL-IFD).


$$
\begin{aligned}
R(I+U) &= \lfloor (P^{S}(I+Q) + P^{S}(I-Q))/2 \rfloor - P^{S}(I+U) \\
R(I+V) &= \lfloor (P^{S}(I+U) + P^{S}(I-U))/2 \rfloor - P^{S}(I+V)
\end{aligned} \tag{8}
$$


where $R(I+U)$ and $R(I+V)$ are the results of eliminating
spatiotemporal redundancy.
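
The two differential schemes above can be illustrated with a short Python sketch. It rests on several assumptions: the SP operator is replaced by a plain 1D forward difference (`sp`) to keep the example short, the prediction uses the floored mean of the reference pair, and the synthetic frames (a shared intensity plus weak polarization signals) are invented test data.

```python
import numpy as np

def sp(p: np.ndarray) -> np.ndarray:
    """Stand-in for the SP operator: 1D forward difference only (assumption)."""
    r = p.copy()
    r[1:] -= p[:-1]
    return r

def sp_inv(r: np.ndarray) -> np.ndarray:
    """Inverse of the stand-in operator."""
    return np.cumsum(r)

def rifd_encode(p_plus: np.ndarray, p_minus: np.ndarray) -> np.ndarray:
    return p_plus - p_minus                        # R(I-S) = P(I+S) - P(I-S)

def rifd_decode(p_plus: np.ndarray, residual: np.ndarray) -> np.ndarray:
    return p_plus - residual                       # recover P(I-S)

def suvl_encode(ref_plus, ref_minus, target_plus):
    pred = (sp(ref_plus) + sp(ref_minus)) // 2     # floored mean of the reference pair
    return pred - sp(target_plus)

def suvl_decode(ref_plus, ref_minus, residual):
    pred = (sp(ref_plus) + sp(ref_minus)) // 2
    return sp_inv(pred - residual)                 # recover the target frame

rng = np.random.default_rng(1)
base = rng.integers(1000, 4000, size=500)          # shared intensity, hypothetical
q = rng.integers(-5, 6, size=500)                  # weak polarization signals
u = rng.integers(-5, 6, size=500)
p_q_plus, p_q_minus = base + q, base - q
p_u_plus = base + u

r_q = rifd_encode(p_q_plus, p_q_minus)
r_u = suvl_encode(p_q_plus, p_q_minus, p_u_plus)
print(np.abs(r_q).max(), np.abs(r_u).max())
```

In this toy data the residuals stay within a few counts even though the frames themselves span thousands of counts, which is the effect both schemes aim for.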

Because $P(\lambda_c + \Delta\lambda, I+Q)$ exhibits large-scale
changes at different wavelength positions, we employ the SP
predictive coding to smooth out the large-scale changes in
image intensity and eliminate the influence of image jitter.
Then, we normalize the intensity of the previous frame to the
current one, perform SP predictive coding on both frames, and
take the difference of their results. This process is
defined as the inter-frame differential predictive coding in
spectral scanning (SQL-IFD). The spatiotemporal redundancy of
adjacent images in spectral scanning is eliminated as follows:


where $\bar{H}$ represents the mean of matrix $H$, and
$j$ is the index of the wavelength position.
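
The procedure described above (normalize the previous frame, apply SP to both frames, take the difference) can be sketched as follows. The exact normalization used by the authors is not recoverable from the text, so this example makes its own assumptions: the scale factor is the ratio of the frame sums, the scaling is done in integer arithmetic so the decoder can repeat it exactly, the sum of the current frame is sent as side information, and a 1D forward difference stands in for the SP operator.

```python
import numpy as np

def sp(p: np.ndarray) -> np.ndarray:
    """Stand-in for the SP operator: 1D forward difference only (assumption)."""
    r = p.copy()
    r[1:] -= p[:-1]
    return r

def sp_inv(r: np.ndarray) -> np.ndarray:
    return np.cumsum(r)

def normalize(prev: np.ndarray, sum_cur: int) -> np.ndarray:
    """Scale the previous frame to the mean level of the current one (floored)."""
    return (prev * sum_cur) // int(prev.sum())

def sql_encode(prev: np.ndarray, cur: np.ndarray):
    sum_cur = int(cur.sum())                       # side information for the decoder
    residual = sp(normalize(prev, sum_cur)) - sp(cur)
    return residual, sum_cur

def sql_decode(prev: np.ndarray, residual: np.ndarray, sum_cur: int) -> np.ndarray:
    return sp_inv(sp(normalize(prev, sum_cur)) - residual)

rng = np.random.default_rng(2)
prev = rng.integers(2000, 4000, size=500)              # frame at wavelength index j-1
cur = (prev * 6) // 10 + rng.integers(-5, 6, size=500) # dimmer frame at index j
residual, sum_cur = sql_encode(prev, cur)
recovered = sql_decode(prev, residual, sum_cur)
print(np.abs(sp(cur)).mean(), np.abs(residual).mean())
```

The decoder already holds the previous frame, so the residual plus one integer is enough to rebuild the current frame without loss.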
3.3. Mapping and Encoding Design


The mapped data in the article approximately follow a
univariate geometric distribution. Considering the usual
requirements for the encoding algorithm in space projects (i.e.,
no patent issues, low complexity, and simple hardware
implementation), the Golomb-Rice algorithm is selected for
encoding.

To employ the Golomb-Rice coding, an ideal source should
follow a discrete geometric distribution (Merhav et al. 1998).
Typically, the probability density distribution of the predicted
residuals exhibits slight asymmetry, as shown in Figure 5. The
vertical line through the peak point serves as the approximate
symmetry axis: the long side extends to P1, the length from the
peak point to P1 is L1, and the blue area is M1; the short side
extends to P2, the length from the peak point to P2 is L2, and
the green area is M2.

The distribution in Figure 5 deviates from the ideal
geometric distribution, so it is necessary to map the existing
predictive residual data set from a general normal form to an
approximate discrete geometric distribution.

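The final mapping equation, with $O$ denoting the residual value at the peak point, is as follows:


$$
R' = \begin{cases}
-2(R-O)-1, & (R-O) < 0,\ M1 > M2,\ |R-O| \le L2 \\
2(R-O), & (R-O) \ge 0,\ M1 > M2,\ |R-O| \le L2 \\
-2(R-O), & (R-O) \le 0,\ M1 \le M2,\ |R-O| \le L2 \\
2(R-O)-1, & (R-O) > 0,\ M1 \le M2,\ |R-O| \le L2 \\
|R-O| + L2, & |R-O| > L2
\end{cases} \tag{10}
$$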

where $R$ is the predictive residual, $R'$ is the mapped result, and
M1/M2 represents the cumulative probability density for
residual data, which is calculated by integrating the probability
density from the peak point to P1/P2.
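
To make the mapping and encoding stage concrete, the sketch below pairs a signed-to-unsigned mapping with a basic Golomb-Rice coder. It deliberately swaps in the standard symmetric "zigzag" mapping rather than the asymmetry-aware mapping of Equation (10), and it omits the length limit of the finite-length variant; the parameter `k` and all function names are assumptions chosen for the example.

```python
def zigzag(r: int) -> int:
    """Symmetric mapping of a signed residual to a non-negative integer."""
    return 2 * r if r >= 0 else -2 * r - 1

def unzigzag(m: int) -> int:
    """Inverse of zigzag."""
    return m // 2 if m % 2 == 0 else -(m + 1) // 2

def rice_encode(values, k: int) -> str:
    """Golomb-Rice code with divisor 2**k: unary quotient, then k-bit remainder."""
    bits = []
    for m in values:
        q, rem = m >> k, m & ((1 << k) - 1)
        bits.append("1" * q + "0")               # quotient in unary, closed by a 0
        if k:
            bits.append(format(rem, f"0{k}b"))   # remainder in k binary digits
    return "".join(bits)

def rice_decode(bits: str, k: int, count: int):
    out, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":                    # read the unary quotient
            q += 1
            i += 1
        i += 1                                   # skip the terminating 0
        rem = int(bits[i:i + k], 2) if k else 0
        i += k
        out.append((q << k) | rem)
    return out

residuals = [0, -1, 2, -3, 1, 0, 4]              # toy predictive residuals
mapped = [zigzag(r) for r in residuals]
code = rice_encode(mapped, k=1)
decoded = [unzigzag(m) for m in rice_decode(code, k=1, count=len(mapped))]
print(mapped, code, decoded)
```

Small mapped values get short codewords, which is why a residual distribution that is close to geometric after mapping suits this coder.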

The entire lossless compression method can be summarized
in the following steps:


1. Reduce all data using Equation (3).

2. According to Figure 1, different data are predictively
coded using the corresponding methods. For details, refer to
Equations (6)-(9).

3. Map the predictive residual according to Equation (10),
and apply finite-length Golomb-Rice coding to the result.


4. Experiment
4.1. Data Selection and Experimental Design


The Solar Full-disk Multilayer Magnetograph (SFMM) at
Ganyu station has a measurement mode and data
characteristics similar to those of MHI. Therefore, the SFMM data are used in
our experiments to test the compression method. A group of
spectral scanning data by SFMM includes Stokes $I \pm Q$, $I \pm U$, and
$I \pm V$ measurements at 6 wavelength positions, resulting in a
total of 18 files; the bit depth is 32 bit pixel$^{-1}$.

The characteristics of an image, its noise level, and the
compression method employed together determine the magnitude
of the data compression ratio. Under routine scientific
observation modes, MHI's data are not affected by cloud cover;
hence, when the signal-to-noise ratio of the detector remains
constant, the primary variable among image characteristics
affecting the data compression ratio is the variation in the
number of sunspots. To guarantee the robustness of the
method, we select three groups of SFMM data with different
sunspot characteristics as the test set, including no sunspots,
small sunspots, and sunspot groups. There are 54 files in total,
each containing two images, making a total of 108 images.
The compression ratio of these files can serve as evidence for
the universality of this method.

The experiment is divided into three parts: 


1. Compress all the data in the test set and analyze whether 
the compression ratio meets the MHI's requirements. 

2. Compress one group of the data sets using multiple 
compression algorithms, and analyze whether the com- 
pression ratio in this article is superior to the general 
algorithms. 

3. Analyze whether the time consumption of our compression
method satisfies the MHI’s real-time requirement.


4.2. Result Analysis
4.2.1. Analysis of Compression Ratio


The compression ratios for the three sets of spectral scanning
data by our lossless method are shown in Figure 6. Regardless
of the varying number of sunspots, a consistent trend is
observed in the overall distribution of compression ratios, with
an average ratio of approximately 3.67 and a standard deviation
less than 0.11 (shown in Table 1). This indicates that not only
does the method surpass the SPO requirement of
3.5, but it also exhibits small fluctuations, suggesting minimal
influence from the number of sunspots. Thus, the method
demonstrates strong universality.


4.2.2. Comparison with Other Lossless Compression 
Algorithms 


To demonstrate the compression performance of the method
proposed in this article, we compared it with JPEG2000, which is
widely used in space data compression, the partitioned RICE
compression algorithm adopted by the Flexible Image Transport
System (FITS), and the mature compression algorithms LZMA2
and RAR. The result of compressing the raw data (including
solar disk and all background information) is shown in
Figure 7.

From Figure 7, it is evident that among the alternative
methods compared to ours, RAR has the lowest compression
ratio, followed by RICE in FITS, LZMA2 compression, and
JPEG2000, with their best compression ratio approaching
2.5. In contrast, our proposed method achieves
the highest compression ratio, surpassing 3.5. Even if these
comparison methods were to concentrate exclusively on
compressing the solar disk, according to Equation (2), their
optimal compression ratio would theoretically approach
3.1831 (that is, 2.5 × 1.27324), which is still less than 3.5. This demonstrates that our
method is more suitable for lossless compression of
the solar full-disk Stokes images in the spectral scanning mode.
Note that our method follows the same trend as the others except for
slightly larger fluctuations, because we retained only the solar disk
data during dimensionality reduction.


4.2.3. Time Consumption of Compression Method


Loongson processors have been successfully
applied in space projects of China for onboard processing, and
the performance of the latest Loongson 3A6000 processor (with
a clock frequency of 2.0-2.5 GHz) is similar to that of the 10th
generation Intel Core processor. So in this paper we use a single
core of the 10th generation Core i7 10750H processor instead of
the Loongson 3A6000 processor for the time consumption testing,
with Turbo Boost disabled and the peak frequency limited to no
more than 2.28 GHz. The test data set is from sunspot groups.

Figure 8 shows that the compression time for a pair of
$P(I \pm S)$ is less than 12 s. A single observation cycle of the MHI
payload will last 15 minutes, and there will be a total of 18
pairs of $P(I \pm S)$. Therefore, using the compression method
proposed in this paper, the compression time is less than 216 s
(18 × 12 s), which is much shorter than the observation cycle of 15 minutes,
meeting the time requirements for MHI.


5. Discussion and Conclusion 


In order to address the urgent need for efficient lossless data
compression in MHI, this article proposes a targeted lossless
compression method based on the measurement mode and data
characteristics. The main process and test results of our method
are as follows. The background information outside the solar
disk is removed and the remaining data are dimensionality reduced by utilizing the
largest inscribed circle within the image, which results in an
overall data volume reduction of about 21.46%;
subsequently, predictive coding is performed to eliminate the
correlations (space, spectrum, and polarization) in the solar data
cube: a single frame is predicted using SP predictive coding,
while the remaining frames are predicted using inter-frame
differential predictive encoding combined with SP predictive
coding. After mapping and finite-length Golomb-Rice encoding,
the final average compression ratio is up to 3.67. It is
significantly improved compared with general lossless methods
such as LZMA2, RAR, RICE in FITS, and JPEG2000. At the
same time, the average processing time of our method for a set
of data is less than one observation period, meeting real-time
requirements. Due to the limited capability of the acquisition
device and processor, our method only involves data binarization,
matrix multiplication, transposition, differencing, array
traversal, etc. The subsequent encoding is a mature scheme that
has been in general use for many years. Therefore, the proposed
method has strong feasibility, and can be quickly ported to the
payload for testing its practical time consumption and effect.